AITopics | descent step

An equivalence between high dimensional Bayes optimal inference and M-estimation

When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the number of measurements. In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure. This procedure involves minimizing convex loss and regularizer functions that are nonlinearly smoothed versions of the widely applied MAP optimization problem. Our findings provide a new heuristic derivation and interpretation for recent optimal M-estimators found in the setting of linear measurements and additive noise, and further extend these results to nonlinear measurements with non-additive noise. We numerically demonstrate superior performance of our optimal M-estimators relative to MAP. Overall, at the heart of our work is the revelation of a remarkable equivalence between two seemingly very different computational problems: namely that of high dimensional Bayesian integration underlying MMSE inference, and high dimensional convex optimization underlying M-estimation. In essence we show that the former difficult integral may be computed by solving the latter, simpler optimization problem.
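The gap between MAP and MMSE inference described above can be seen in a one-dimensional toy problem. The sketch below is not taken from the paper: it assumes a scalar observation y = x + noise with Gaussian noise and a Laplace prior on x (both log-concave), and the function names `map_estimate` and `mmse_estimate` are hypothetical. Under these assumptions MAP reduces to a regularized M-estimate with a closed form (soft thresholding), while the MMSE estimate is a posterior-mean integral, evaluated here by brute-force quadrature.

```python
import math


def map_estimate(y, sigma=1.0, b=1.0):
    """MAP estimate for y = x + N(0, sigma^2) with a Laplace(scale=b) prior.

    Minimizes (y - x)^2 / (2 sigma^2) + |x| / b, whose solution is
    soft thresholding of y at the level sigma^2 / b.
    """
    threshold = sigma ** 2 / b
    return math.copysign(max(abs(y) - threshold, 0.0), y)


def mmse_estimate(y, sigma=1.0, b=1.0, half_width=30.0, n=20001):
    """Posterior mean E[x | y] by quadrature on a uniform grid (illustrative only)."""
    num = 0.0
    den = 0.0
    for k in range(n):
        x = -half_width + 2.0 * half_width * k / (n - 1)
        # Unnormalized posterior density: likelihood times prior.
        w = math.exp(-((y - x) ** 2) / (2.0 * sigma ** 2) - abs(x) / b)
        num += x * w
        den += w
    return num / den


if __name__ == "__main__":
    for y in (0.5, 1.5, 3.0):
        print(y, map_estimate(y), mmse_estimate(y))
```

For small |y| the MAP estimate is exactly zero while the posterior mean is not, which is one simple way the two estimators differ. In high dimensions the integral cannot be evaluated this way, which is the difficulty the abstract refers to.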

artificial intelligence, inference, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

a548ef984f30bca3abdc09f43743827f-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 03:49:55 GMT

algorithm, constraint, def, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Europe > Germany > Berlin (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

a3a7387e49f4de290c23beea2dfcdc75-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 03:11:21 GMT

descent step, dime, exp, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Descent Steps of a Relation-Aware Energy Produce Heterogeneous Graph Neural Networks

Neural Information Processing Systems | Dec-25-2025, 19:02:05 GMT

Heterogeneous graph neural networks (GNNs) achieve strong performance on node classification tasks in a semi-supervised learning setting. However, as in the simpler homogeneous GNN case, message-passing-based heterogeneous GNNs may struggle to balance between resisting the oversmoothing that may occur in deep models, and capturing long-range dependencies of graph structured data. Moreover, the complexity of this trade-off is compounded in the heterogeneous graph case due to the disparate heterophily relationships between nodes of different types. To address these issues, we propose a novel heterogeneous GNN architecture in which layers are derived from optimization steps that descend a novel relation-aware energy function. The corresponding minimizer is fully differentiable with respect to the energy function parameters, such that bilevel optimization can be applied to effectively learn a functional form whose minimum provides optimal node representations for subsequent classification tasks. In particular, this methodology allows us to model diverse heterophily relationships between different node types while avoiding oversmoothing effects. Experimental results on 8 heterogeneous graph benchmarks demonstrate that our proposed method can achieve competitive node classification accuracy.
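The idea of deriving layers from descent steps on an energy can be illustrated on a much simpler case. The sketch below is an assumption-laden stand-in, not the paper's relation-aware energy: it uses a homogeneous graph and the quadratic energy E(Y) = ||Y - X||^2 + lam * sum over edges (i, j) of ||y_i - y_j||^2, and the names `energy` and `descent_layer` are hypothetical. One gradient step on this energy mixes each node's features with those of its neighbours and with its input features, which has the shape of a message-passing layer with a skip connection.

```python
def energy(Y, X, edges, lam):
    """E(Y) = ||Y - X||^2 + lam * sum_{(i, j) in edges} ||y_i - y_j||^2."""
    fit = sum((y - x) ** 2 for yi, xi in zip(Y, X) for y, x in zip(yi, xi))
    smooth = sum((a - b) ** 2 for i, j in edges for a, b in zip(Y[i], Y[j]))
    return fit + lam * smooth


def descent_layer(Y, X, edges, lam, alpha):
    """One gradient descent step on `energy`, viewed as a single layer.

    `edges` lists each undirected edge once as a pair (i, j).
    """
    n, d = len(Y), len(Y[0])
    # Gradient of the data-fit term.
    grad = [[2.0 * (Y[i][k] - X[i][k]) for k in range(d)] for i in range(n)]
    # Gradient of the smoothness term: each edge pulls its endpoints together.
    for i, j in edges:
        for k in range(d):
            diff = 2.0 * lam * (Y[i][k] - Y[j][k])
            grad[i][k] += diff
            grad[j][k] -= diff
    return [[Y[i][k] - alpha * grad[i][k] for k in range(d)] for i in range(n)]


if __name__ == "__main__":
    X = [[1.0], [0.0], [-1.0]]   # input features on a 3-node path graph
    edges = [(0, 1), (1, 2)]
    Y = [row[:] for row in X]
    for layer in range(5):       # stacking steps corresponds to adding depth
        Y = descent_layer(Y, X, edges, lam=1.0, alpha=0.1)
        print(layer, energy(Y, X, edges, 1.0))
```

Because the iterates converge to the minimizer of the energy rather than to a constant vector, adding more such layers does not by itself force all node representations to coincide, as long as the step size is small enough for the energy to decrease.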

descent step, name change, produce heterogeneous graph neural network, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Supplementary Materials for Descent Steps of a Relation-Aware Energy Produce Heterogeneous Graph Neural Networks

Neural Information Processing Systems | Aug-19-2025, 21:33:07 GMT

X)vec (Y) (2) We now proceed with the proof of our result. Work completed during an internship at the AWS Shanghai AI Lab. Note that we apply Roth's column lemma to (11) to derive (12). GNN layers with 16 hidden dimensions. Table 1: Results using different base models (left) and test time comparisons (right).

artificial intelligence, machine learning, vec, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.24)
North America > United States > Michigan (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Fast Algorithms for Packing Proportional Fairness and its Dual

Neural Information Processing Systems | Aug-17-2025, 09:24:58 GMT

The assignment of bounded resources to several agents under some notions of fairness is a topic studied in networking, operations research, game theory, and economic theory.

algorithm, artificial intelligence, optimization problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Europe > Germany > Berlin (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

a3a7387e49f4de290c23beea2dfcdc75-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-17-2025, 08:53:16 GMT

artificial intelligence, dime, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization

Luo, Xinyu, Bai, Cedar Site, Li, Bolian, Drineas, Petros, Zhang, Ruqi, Bullins, Brian

arXiv.org Artificial Intelligence | Jun-11-2025

While popular optimization methods such as SGD, AdamW, and Lion depend on steepest descent updates in either $\ell_2$ or $\ell_\infty$ norms, there remains a critical gap in handling the non-Euclidean structure observed in modern deep network training. In this work, we address this need by introducing a new accelerated $\ell_p$ steepest descent algorithm, called Stacey, which uses interpolated primal-dual iterate sequences to effectively navigate non-Euclidean smooth optimization tasks. In addition to providing novel theoretical guarantees for the foundations of our algorithm, we empirically compare our approach against these popular methods on tasks including image classification and language model (LLM) pretraining, demonstrating both faster convergence and higher final accuracy. We further evaluate different values of $p$ across various models and datasets, underscoring the importance and efficiency of non-Euclidean approaches over standard Euclidean methods. Code can be found at https://github.com/xinyuluo8561/Stacey.
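To make the role of the norm concrete, the following sketch computes a plain (unaccelerated) steepest descent direction with respect to an $\ell_p$ norm. It is not the Stacey algorithm and does not use its primal-dual interpolation; the function name `steepest_direction` is hypothetical. The direction maximizes the inner product with the gradient over the unit $\ell_p$ ball, which for $p = 2$ gives the normalized gradient and for $p = \infty$ gives the sign vector.

```python
import math


def steepest_direction(g, p):
    """Unit-l_p-norm vector d maximizing <g, d>, for p > 1 or p = inf.

    With q = p / (p - 1) the dual exponent, the maximizer is
    d_i = sign(g_i) * |g_i|^(q - 1) / ||g||_q^(q - 1).
    A descent update would move along -d.
    """
    if all(v == 0 for v in g):
        return [0.0 for _ in g]
    if math.isinf(p):
        # Dual norm is l_1; the maximizer is the sign vector.
        return [math.copysign(1.0, v) if v != 0 else 0.0 for v in g]
    q = p / (p - 1.0)
    norm_q = sum(abs(v) ** q for v in g) ** (1.0 / q)
    scale = norm_q ** (q - 1.0)
    return [math.copysign(abs(v) ** (q - 1.0), v) / scale for v in g]


if __name__ == "__main__":
    g = [3.0, -4.0, 0.5]
    for p in (2.0, 4.0, math.inf):
        print(p, steepest_direction(g, p))
```

Intermediate values of p interpolate between the two familiar cases: larger p flattens the magnitudes of the coordinates toward a sign-like update, while p = 2 preserves the gradient's relative scaling.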

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2506.06606

Country: North America (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Blended Conditional Gradients: the unconditioning of conditional gradients

Braun, Gábor, Pokutta, Sebastian, Tu, Dan, Wright, Stephen

arXiv.org Artificial Intelligence | Mar-21-2025

We present a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank--Wolfe algorithm (also called conditional gradient) with gradient-based steps, different from away steps and pairwise steps, but still achieving linear convergence for strongly convex functions, along with good practical performance. Our approach retains all favorable properties of conditional gradient algorithms, notably avoidance of projections onto P and maintenance of iterates as sparse convex combinations of a limited number of extreme points of P. The algorithm is lazy, making use of inexpensive inexact solutions of the linear programming subproblem that characterizes the conditional gradient approach. It decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the lazy conditional gradient algorithms of [arXiv:1410.8816]. We also present a streamlined version of the algorithm for the probability simplex.
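For orientation, the baseline that the blended method builds on can be sketched for the probability simplex. This is the vanilla Frank-Wolfe (conditional gradient) iteration with the standard 2/(t+2) step size, not the blended or lazy variant from the paper; the function name `frank_wolfe_simplex` and the quadratic objective in the usage example are assumptions chosen for illustration. The linear subproblem over the simplex is solved by picking the coordinate with the smallest gradient entry, so no projection is needed and each iterate is a convex combination of the vertices visited so far.

```python
def frank_wolfe_simplex(grad, x0, steps=200):
    """Vanilla Frank-Wolfe over the probability simplex.

    `grad` maps a point (list of floats) to the gradient of a smooth
    convex objective at that point; `x0` must lie in the simplex.
    """
    x = list(x0)
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: the best vertex is the unit vector
        # at the coordinate with the smallest gradient entry.
        i = min(range(len(x)), key=g.__getitem__)
        gamma = 2.0 / (t + 2.0)
        # Move toward vertex e_i; the result stays in the simplex.
        x = [(1.0 - gamma) * v for v in x]
        x[i] += gamma
    return x


if __name__ == "__main__":
    b = [0.2, 0.5, 0.3]  # target point inside the simplex

    def grad(x):
        # Gradient of f(x) = ||x - b||^2.
        return [2.0 * (xi - bi) for xi, bi in zip(x, b)]

    print(frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], steps=2000))
```

The vanilla method converges only sublinearly in general, which is part of the motivation for variants such as away steps, pairwise steps, and the blended approach described in the abstract.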

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1805.07311

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Descent Steps of a Relation-Aware Energy Produce Heterogeneous Graph Neural Networks

Neural Information Processing Systems | Jan-19-2025, 08:09:01 GMT

Heterogeneous graph neural networks (GNNs) achieve strong performance on node classification tasks in a semi-supervised learning setting. However, as in the simpler homogeneous GNN case, message-passing-based heterogeneous GNNs may struggle to balance between resisting the oversmoothing that may occur in deep models, and capturing long-range dependencies of graph structured data. Moreover, the complexity of this trade-off is compounded in the heterogeneous graph case due to the disparate heterophily relationships between nodes of different types. To address these issues, we propose a novel heterogeneous GNN architecture in which layers are derived from optimization steps that descend a novel relation-aware energy function. The corresponding minimizer is fully differentiable with respect to the energy function parameters, such that bilevel optimization can be applied to effectively learn a functional form whose minimum provides optimal node representations for subsequent classification tasks. In particular, this methodology allows us to model diverse heterophily relationships between different node types while avoiding oversmoothing effects.

descent step, heterophily relationship, produce heterogeneous graph neural network, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

An equivalence between high dimensional Bayes optimal inference and M-estimation
